Results 1 - 20 of 64
1.
J Transl Med ; 22(1): 122, 2024 Jan 31.
Article in English | MEDLINE | ID: mdl-38297333

ABSTRACT

BACKGROUND: Emerging evidence suggests that Rho GTPases play a crucial role in tumorigenesis and metastasis, but their involvement in the tumor microenvironment (TME) and prognosis of hepatocellular carcinoma (HCC) is not well understood. METHODS: We aim to develop a tumor prognosis prediction system called the Rho GTPases-related gene score (RGPRG score) using Rho GTPase signaling genes and further bioinformatic analyses. RESULTS: Our work found that HCC patients with a high RGPRG score had significantly worse survival and increased immunosuppressive cell fractions compared to those with a low RGPRG score. Single-cell cohort analysis revealed an immune-active TME in patients with a low RGPRG score, with strengthened communication from T/NK cells to other cells through MIF signaling networks. Targeting these alterations in TME, the patients with high RGPRG score have worse immunotherapeutic outcomes and decreased survival time in the immunotherapy cohort. Moreover, the RGPRG score was found to be correlated with survival in 27 other cancers. In vitro experiments confirmed that knockdown of the key Rho GTPase-signaling biomarker SFN significantly inhibited HCC cell proliferation, invasion, and migration. CONCLUSIONS: This study provides new insight into the TME features and clinical use of Rho GTPase gene pattern at the bulk-seq and single-cell level, which may contribute to guiding personalized treatment and improving clinical outcome in HCC.


Subjects
Hepatocellular Carcinoma , Liver Neoplasms , Humans , Hepatocellular Carcinoma/genetics , Liver Neoplasms/genetics , Carcinogenesis , Cell Line , Immunosuppressive Agents , rho GTP-Binding Proteins , Tumor Microenvironment
2.
Bioinform Adv ; 4(1): vbae006, 2024.
Article in English | MEDLINE | ID: mdl-38282975

ABSTRACT

Summary: Third-generation long-read sequencing is an increasingly utilized technique for profiling human immunodeficiency virus (HIV) quasispecies and detecting drug resistance mutations due to its ability to cover the entire viral genome in individual reads. Recently, the ClusterV tool has demonstrated accurate detection of HIV quasispecies from Nanopore long-read sequencing data. However, the need for scripting skills and a computational environment may act as a barrier for many potential users. To address this issue, we have introduced ClusterV-Web, a user-friendly web-based application that enables easy configuration and execution of ClusterV, both remotely and locally. Our tool provides interactive tables and data visualizations to aid in the interpretation of results. This development is expected to democratize access to long-read sequencing data analysis, enabling a wider range of researchers and clinicians to efficiently profile HIV quasispecies and detect drug resistance mutations. Availability and implementation: ClusterV-Web is freely available and open source, with detailed documentation accessible at http://www.bio8.cs.hku.hk/ClusterVW/. The standalone Docker image and source code are also available at https://github.com/HKU-BAL/ClusterV-Web.

4.
Stem Cell Res Ther ; 14(1): 247, 2023 09 13.
Article in English | MEDLINE | ID: mdl-37705079

ABSTRACT

AIMS: Dissecting complex interactions among transcription factors (TFs), microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is central to understanding heart development and function. Although computational approaches and platforms have been described to infer relationships among regulatory factors and genes, current approaches do not adequately account for how highly diverse, interacting regulators that include noncoding RNAs (ncRNAs) control cardiac gene expression dynamics over time. METHODS: To overcome this limitation, we devised an integrated framework, cardiac gene regulatory modeling (CGRM), that integrates LogicTRN and regulatory component analysis bioinformatics modeling platforms to infer complex regulatory mechanisms. We then used CGRM to identify and compare the TF-ncRNA gene regulatory networks that govern early- and late-stage cardiomyocytes (CMs) generated by in vitro differentiation of human pluripotent stem cells (hPSC) and ventricular and atrial CMs isolated during in vivo human cardiac development. RESULTS: Comparisons of in vitro versus in vivo derived CMs revealed conserved regulatory networks among TFs and ncRNAs in early cells that significantly diverged in late-stage cells. We report that cardiac genes ("heart targets") expressed in early-stage hPSC-CMs are primarily regulated by MESP1, miR-1, miR-23, lncRNAs NEAT1 and MALAT1, while GATA6, HAND2, miR-200c, NEAT1 and MALAT1 are critical for late hPSC-CMs. The inferred TF-miRNA-lncRNA networks regulating heart development and contraction were similar among early-stage CMs, among individual hPSC-CM datasets and between in vitro and in vivo samples. However, genes related to apoptosis, cell cycle and proliferation, and transmembrane transport showed a high degree of divergence between in vitro and in vivo derived late-stage CMs. Overall, late-stage, but not early-stage, CMs diverged greatly in the expression of "heart target" transcripts and their regulatory mechanisms.
CONCLUSIONS: In conclusion, we find that hPSC-CMs are regulated in a cell autonomous manner during early development that diverges significantly as a function of time when compared to in vivo derived CMs. These findings demonstrate the feasibility of using CGRM to reveal dynamic and complex transcriptional and posttranscriptional regulatory interactions that underlie cell directed versus environment-dependent CM development. These results with in vitro versus in vivo derived CMs thus establish this approach for detailed analyses of heart disease and for the analysis of cell regulatory systems in other biomedical fields.


Subjects
MicroRNAs , Long Noncoding RNA , Humans , Long Noncoding RNA/genetics , Transcription Factors/genetics , MicroRNAs/genetics , Cardiac Myocytes , Heart Ventricles
5.
BMC Bioinformatics ; 24(1): 308, 2023 Aug 03.
Article in English | MEDLINE | ID: mdl-37537536

ABSTRACT

BACKGROUND: With the continuous advances in third-generation sequencing technology and the increasing affordability of next-generation sequencing technology, sequencing data from different sequencing technology platforms is becoming more common. While numerous benchmarking studies have been conducted to compare variant-calling performance across different platforms and approaches, little attention has been paid to the potential of leveraging the strengths of different platforms to optimize overall performance, especially integrating Oxford Nanopore and Illumina sequencing data. RESULTS: We investigated the impact of multi-platform data on the performance of variant calling through carefully designed experiments with a deep learning-based variant caller named Clair3-MP (Multi-Platform). Through our research, we not only demonstrated the capability of ONT-Illumina data for improved variant calling, but also identified the optimal scenarios for utilizing ONT-Illumina data. In addition, we revealed that the improvement in variant calling using ONT-Illumina data comes from an improvement in difficult genomic regions, such as the large low-complexity regions and segmental and collapse duplication regions. Moreover, Clair3-MP can incorporate reference genome stratification information to achieve a small but measurable improvement in variant calling. Clair3-MP is accessible as an open-source project at: https://github.com/HKU-BAL/Clair3-MP . CONCLUSIONS: These insights have important implications for researchers and practitioners alike, providing valuable guidance for improving the reliability and efficiency of genomic analysis in diverse applications.


Subjects
Genome , Genomics , Reproducibility of Results , High-Throughput Nucleotide Sequencing
6.
Clin Chem ; 69(10): 1174-1185, 2023 10 03.
Article in English | MEDLINE | ID: mdl-37537871

ABSTRACT

BACKGROUND: HIV infections often develop drug resistance mutations (DRMs), which can increase the risk of virological failure. However, it has been difficult to determine if minor mutations occur in the same genome or in different virions using Sanger sequencing and short-read sequencing methods. Oxford Nanopore Technologies (ONT) sequencing may improve antiretroviral resistance profiling by allowing for long-read clustering. METHODS: A new ONT sequencing-based method for profiling DRMs in HIV quasispecies was developed and validated. The method used hierarchical clustering of long amplicons that cover regions associated with different types of antiretroviral drugs. A gradient series of an HIV plasmid and 2 plasma samples was prepared to validate the clustering performance. The ONT results were compared to those obtained with Sanger sequencing and Illumina sequencing in 77 HIV-positive plasma samples to evaluate the diagnostic performance. RESULTS: In the validation study, the abundance of detected quasispecies was concordant with the predicted result with the R2 of > 0.99. During the diagnostic evaluation, 59/77 samples were successfully sequenced for DRMs. Among 18 failed samples, 17 were below the limit of detection of 303.9 copies/µL. Based on the receiver operating characteristic analysis, the ONT workflow achieved an F1 score of 0.96 with a cutoff of 0.4 variant allele frequency. Four cases were found to have quasispecies with DRMs, in which 2 harbored quasispecies with more than one class of DRMs. Treatment modifications were recommended for these cases. CONCLUSIONS: Long-read sequencing coupled with hierarchical clustering could differentiate the quasispecies resistance profiles in HIV-infected samples, providing a clearer picture for medical care.
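The F1 score at a variant allele frequency (VAF) cutoff mentioned in this abstract can be illustrated with a minimal sketch. The mutation labels, VAF values, and the `f1_at_cutoff` helper below are hypothetical; they are not taken from the study's workflow and only show how such a score is commonly computed.

```python
def f1_at_cutoff(called_vaf, truth, cutoff=0.4):
    """Compute precision, recall, and F1 for calls kept at or above a VAF cutoff.

    called_vaf: dict mapping a mutation label to its observed variant allele frequency.
    truth: set of mutation labels regarded as truly present (e.g. from a reference method).
    """
    kept = {label for label, vaf in called_vaf.items() if vaf >= cutoff}
    tp = len(kept & truth)   # called and truly present
    fp = len(kept - truth)   # called but not present
    fn = len(truth - kept)   # present but missed or filtered out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, for illustration only.
calls = {"mut_a": 0.80, "mut_b": 0.45, "mut_c": 0.10}
truth = {"mut_a", "mut_b", "mut_d"}
precision, recall, f1 = f1_at_cutoff(calls, truth, cutoff=0.4)
```

Sweeping `cutoff` over a range of values and recording the resulting scores is one simple way to choose an operating point, under the assumption that a trusted truth set is available.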


Subjects
HIV Infections , HIV-1 , Humans , HIV Infections/drug therapy , Quasispecies/genetics , HIV-1/genetics , Anti-Retroviral Agents/pharmacology , Anti-Retroviral Agents/therapeutic use , Mutation , High-Throughput Nucleotide Sequencing/methods , Cluster Analysis
7.
Sci Rep ; 13(1): 5237, 2023 03 31.
Article in English | MEDLINE | ID: mdl-37002338

ABSTRACT

Sensitive detection of Mycobacterium tuberculosis (TB) in small percentages in metagenomic samples is essential for microbial classification and drug resistance prediction. However, traditional methods, such as bacterial culture and microscopy, are time-consuming and sometimes have limited TB detection sensitivity. Oxford Nanopore Technologies (ONT) MinION sequencing allows rapid and simple sample preparation for sequencing. Its recently developed adaptive sequencing selects reads from targets while allowing real-time base-calling to achieve sequence enrichment or depletion during sequencing. Another common enrichment method is PCR amplification of the target TB genes. In this study, we compared both methods using ONT MinION sequencing for TB detection and variant calling in metagenomic samples using both simulation runs and those with synthetic and patient samples. We found that both methods effectively enrich TB reads from a high percentage of human (95%) and other microbial DNA. Adaptive sequencing with readfish and UNCALLED achieved 3.9-fold and 2.2-fold enrichment, respectively, compared to the control run. We provide a simple automatic analysis framework to support the detection of TB for clinical use, openly available at https://github.com/HKU-BAL/ONT-TB-NF . Depending on the patient's medical condition and sample type, we recommend users evaluate and optimize their workflow for different clinical specimens to improve the detection limit.
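As a rough illustration of what a fold-enrichment figure expresses, the sketch below compares the fraction of on-target reads in an enriched run with that in a control run. This definition is an assumption (the abstract does not state how enrichment was calculated), and the `fold_enrichment` helper and the read counts are hypothetical.

```python
def fold_enrichment(target_enriched, total_enriched, target_control, total_control):
    """Ratio of the on-target read fraction in an enriched run to that in a control run."""
    if total_enriched <= 0 or total_control <= 0:
        raise ValueError("total read counts must be positive")
    enriched_fraction = target_enriched / total_enriched
    control_fraction = target_control / total_control
    if control_fraction == 0:
        raise ValueError("control run has no on-target reads; ratio is undefined")
    return enriched_fraction / control_fraction


# Hypothetical counts: 60 of 1000 reads on target with enrichment, 20 of 1000 without.
example_fold = fold_enrichment(60, 1000, 20, 1000)
```

Other definitions (for example, based on sequenced bases rather than read counts) are equally plausible and may give different values for the same run.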


Subjects
Mycobacterium tuberculosis , Nanopores , Humans , Mycobacterium tuberculosis/genetics , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Metagenome , Computer Simulation , DNA Sequence Analysis
8.
Genome Med ; 15(1): 10, 2023 02 14.
Article in English | MEDLINE | ID: mdl-36788602

ABSTRACT

BACKGROUND: Very low-coverage (0.1 to 1×) whole genome sequencing (WGS) has become a promising and affordable approach to discover genomic variants of human populations for genome-wide association study (GWAS). To support genetic screening using preimplantation genetic testing (PGT) in a large population, the sequencing coverage goes below 0.1× to an ultra-low level. However, the feasibility and effectiveness of ultra-low-coverage WGS (ulcWGS) for GWAS remains undetermined. METHODS: We built a pipeline to carry out analysis of ulcWGS data for GWAS. To examine its effectiveness, we benchmarked the accuracy of genotype imputation at the combination of different coverages below 0.1× and sample sizes from 2000 to 16,000, using 17,844 embryo PGT samples with approximately 0.04× average coverage and the standard Chinese sample HG005 with known genotypes. We then applied the imputed genotypes of 1744 transferred embryos who have gestational ages and complete follow-up records to GWAS. RESULTS: The accuracy of genotype imputation under ultra-low coverage can be improved by increasing the sample size and applying a set of filters. From 1744 born embryos, we identified 11 genomic risk loci associated with gestational ages and 166 genes mapped to these loci according to positional, expression quantitative trait locus, and chromatin interaction strategies. Among these mapped genes, CRHBP, ICAM1, and OXTR were more frequently reported as preterm birth related. By joint analysis of gene expression data from previous studies, we constructed interrelationships of mainly CRHBP, ICAM1, PLAGL1, DNMT1, CNTLN, DKK1, and EGR2 with preterm birth, infant disease, and breast cancer. CONCLUSIONS: This study not only demonstrates that ulcWGS could achieve relatively high accuracy of adequate genotype imputation and is capable of GWAS, but also provides insights into the associations between gestational age and genetic variations of the fetal embryos from Chinese population.


Subjects
Genome-Wide Association Study , Premature Birth , Newborn Infant , Female , Humans , Gestational Age , Single Nucleotide Polymorphism , Genetic Testing , Genotype , Quantitative Trait Loci
9.
BMC Bioinformatics ; 23(1): 465, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36344913

ABSTRACT

BACKGROUND: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging. RESULTS: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets of sequencing coverage ranging from 8× to 40×. At low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), the F1-score for SV phasing is further improved in comparison to the performance of other tools, while its performance of SV genotyping and SV calling remains higher than other tools. CONCLUSION: Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It has great performance when scaled to high-coverage genomes, which is adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet .


Subjects
Nanopore Sequencing , Single Nucleotide Polymorphism , DNA Sequence Analysis , High-Throughput Nucleotide Sequencing , Whole Genome Sequencing
10.
DNA Res ; 29(6)2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36308393

ABSTRACT

DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, various DNA lengths of reads generated from different sequencing technologies can significantly affect the results of novel sequences. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technology long reads. Among the newly detected sequences using AF-NS, more than 95% were omitted from those using long-read assemblers and 85% were not present in short reads of Illumina. We identified the common novel sequences among all the samples and revealed their association with the binding motifs of transcription factors. Regarding the placements of the novel sequences, we found about 70% enriched in repeat regions and generated 430 for one specific subpopulation that might be related to their evolution. Our study demonstrates the advance of the assembly-free approach to capture more novel sequences over other assembler based methods. Combining the long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.


Subjects
High-Throughput Nucleotide Sequencing , Nanopores , Humans , DNA Sequence Analysis/methods , Base Sequence , High-Throughput Nucleotide Sequencing/methods , Genomics
11.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35849103

ABSTRACT

Accurate identification of genetic variants from family child-mother-father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long-reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which allows it to input the trio sequencing information and output all of the trio's predicted variants within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios, leveraging the explicit encoding of the Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments. It predicted far fewer Mendelian inheritance violation variations than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.
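To make the notion of a Mendelian inheritance violation concrete, here is a minimal sketch that checks whether a child's diploid genotype at one site can be formed from one maternal and one paternal allele. It is not Clair3-Trio's model or its MCVLoss function; the helper names and the toy genotypes are hypothetical, and de novo mutations, sex chromosomes, and missing calls are ignored.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's two alleles can be one allele from each parent.

    Genotypes are 2-tuples of allele indices, e.g. (0, 1) for a heterozygous call.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m_allele, f_allele))) == child_sorted
        for m_allele, f_allele in product(mother, father)
    )


def count_violations(trio_sites):
    """Count sites whose (child, mother, father) genotypes violate Mendelian inheritance."""
    return sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_sites
    )


# Hypothetical genotype calls at three sites, ordered (child, mother, father).
sites = [
    ((0, 1), (0, 0), (0, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (1, 1)),  # violation: mother cannot contribute allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from each parent
]
violations = count_violations(sites)
```

A count like `violations` is one simple way to compare callers on trio data, since independent per-sample calling gives no guarantee that the three genotypes agree with each other.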


Subjects
Nanopores , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Computer Neural Networks , DNA Sequence Analysis , Software
12.
Microbiol Spectr ; 10(3): e0001422, 2022 06 29.
Article in English | MEDLINE | ID: mdl-35510851

ABSTRACT

Pet bite-related infections are commonly caused by the pet's oral flora transmitted to the animal handlers through the bite wounds. In this study, we isolated a streptococcus, HKU75T, in pure culture from the purulent discharge collected from a guinea pig bite wound in a previously healthy young patient. HKU75T was alpha-hemolytic on sheep blood agar and agglutinated with Lancefield group D and group G antisera. API 20 STREP showed that the most likely identity for HKU75T was S. suis I with 85.4% confidence, while Vitek 2 showed that HKU75T was unidentifiable. MALDI-TOF MS identified HKU75T as Streptococcus suis (score of 1.86 only). 16S rRNA gene sequencing showed that HKU75T was most closely related to S. parasuis (98.3% nucleotide identity), whereas partial groEL and rpoB gene sequencing showed that it was most closely related to S. suis (81.8% and 89.8% nucleotide identity, respectively). Whole genome sequencing and intergenomic distance determined by ANI revealed that there was <85% identity between the genome of HKU75T and those of all other known Streptococcus species. Genome classification using concatenated sequences of 92 bacterial core genes showed that HKU75T belonged to the Suis group. groEL gene sequences identical to that of HKU75T could be directly amplified from the oral cavities of the two guinea pigs owned by the patient. HKU75T is a novel Streptococcus species, which we propose to be named S. oriscaviae. The oral cavity of guinea pigs is presumably a reservoir of S. oriscaviae. Some of the reported S. suis strains isolated from clinical specimens may be S. oriscaviae. IMPORTANCE We reported the discovery of a novel Streptococcus species, proposed to be named Streptococcus oriscaviae, from the pus collected from a guinea pig bite wound in a healthy young patient. The bacterium was initially misidentified as S. suis/S. parasuis by biochemical tests, mass spectrometry, and housekeeping gene sequencing.
Its novelty was confirmed by whole genome sequencing. Comparative genomic studies showed that S. oriscaviae belongs to the Suis group. S. oriscaviae sequences were detected in the oral cavities of the two guinea pigs owned by the patient, suggesting that the oral cavity of guinea pigs could be a reservoir of S. oriscaviae. Some of the reported S. suis strains may be S. oriscaviae. Further studies are warranted to refine our knowledge on this novel Streptococcus species.


Subjects
Streptococcus suis , Animals , Bacterial DNA/genetics , Bacterial Genes , Guinea Pigs , Nucleotides , Phylogeny , 16S Ribosomal RNA/genetics , Streptococcus suis/genetics
13.
Front Mol Biosci ; 9: 714008, 2022.
Article in English | MEDLINE | ID: mdl-35402504

ABSTRACT

Inefficient differentiation and insufficient maturation are barriers to the application of human pluripotent stem cell (hPSC)-derived cardiomyocytes (CMs) for research and therapy. Great strides have been made toward the former, and multiple groups have reported cardiac differentiation protocols that can generate hPSC-CMs at high efficiency. Although many such protocols are based on the modulation of the WNT signaling pathway, they differ in their timing and in the WNT inhibitors used. Little is currently known about whether and how conditions of differentiation affect cardiac maturation. Here we adapted multiple cardiac differentiation protocols to improve cost-effectiveness and consistency, and compared the properties of the hPSC-CMs generated. Our results showed that the schedule of differentiation, but not the choice of WNT inhibitors, was a critical determinant not only of differentiation efficiency, which was expected, but also of CM maturation. Among cultures with comparable purity, hPSC-CMs generated with different differentiation schedules vary in the expression of genes that are important for developmental maturation, and in their structural, metabolic, calcium transient and proliferative properties. In summary, we demonstrated that simple changes in the schedule of cardiac differentiation could promote maturation. To this end, we have optimized a cardiac differentiation protocol that can simultaneously achieve high differentiation efficiency and enhanced developmental maturation. Our findings would advance the production of hPSC-CMs for research and therapy.

14.
Sci Rep ; 12(1): 4519, 2022 03 16.
Article in English | MEDLINE | ID: mdl-35296758

ABSTRACT

Structural variation (SV) is a major cause of genetic disorders. In this paper, we show that low-depth (specifically, 4×) whole-genome sequencing using a single Oxford Nanopore MinION flow cell suffices to support sensitive detection of SV, particularly pathogenic SV for supporting clinical diagnosis. When using 4× ONT WGS data, existing SV calling software often fails to detect pathogenic SV, especially in the form of long deletion, terminal deletion, duplication, and unbalanced translocation. Our new SV calling software SENSV can achieve high sensitivity for all types of SV and a breakpoint precision typically ± 100 bp; both features are important for clinical concerns. The improvement achieved by SENSV stems from several new algorithms. We evaluated SENSV and other software using both real and simulated data. The former was based on 24 patient samples, each diagnosed with a genetic disorder. SENSV found the pathogenic SV in 22 out of 24 cases (all heterozygous, size from hundreds of kbp to a few Mbp), reporting breakpoints within 100 bp of the true answers. On the other hand, no existing software can detect the pathogenic SV in more than 10 out of 24 cases, even when the breakpoint requirement is relaxed to ± 2000 bp.
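The breakpoint criterion described here (a call counted as correct when its breakpoints fall within ± 100 bp of the truth, or ± 2000 bp under the relaxed requirement) can be sketched as a simple tolerance check. The `breakpoints_match` helper and the coordinates are hypothetical and are not taken from SENSV or its evaluation scripts.

```python
def breakpoints_match(called, truth, tolerance_bp=100):
    """True if both breakpoints of a called SV lie within tolerance_bp of the true ones.

    called, truth: (start, end) coordinates on the same chromosome.
    """
    called_start, called_end = called
    true_start, true_end = truth
    return (
        abs(called_start - true_start) <= tolerance_bp
        and abs(called_end - true_end) <= tolerance_bp
    )


# Hypothetical deletion: true breakpoints vs. two candidate calls.
true_sv = (1_000_000, 2_000_000)
precise_call = (1_000_050, 2_000_080)
loose_call = (1_000_050, 2_000_500)

strict_precise = breakpoints_match(precise_call, true_sv)                   # within 100 bp
strict_loose = breakpoints_match(loose_call, true_sv)                       # end is 500 bp off
relaxed_loose = breakpoints_match(loose_call, true_sv, tolerance_bp=2000)   # passes when relaxed
```

A real evaluation would also need to match SV type and chromosome, which this sketch leaves out.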


Subjects
Nanopores , Algorithms , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis , Software , Genetic Translocation , Whole Genome Sequencing
15.
JCO Precis Oncol ; 6: e2100365, 2022 03.
Article in English | MEDLINE | ID: mdl-35235413

ABSTRACT

PURPOSE: Mitogen-activated protein kinase pathway-activating mutations occur in the majority of colorectal cancer (CRC) cases and show mutual exclusivity. We identified 47 epidermal growth factor receptor/BRAF inhibitor-naive CRC patients with dual RAS hotspot/BRAF V600E mutations (CRC-DD) from a cohort of 4,561 CRC patients with clinical next-generation sequencing results. We aimed to define the molecular phenotypes of the CRC-DD and to test if the dual RAS hotspot/BRAF V600E mutations coexist within the same cell. MATERIALS AND METHODS: We developed a single-cell genotyping method with a mutation detection rate of 96.3% and a genotype prediction accuracy of 92.1%. Mutations in the CRC-DD cohort were analyzed for clonality, allelic imbalance, copy number, and overall survival. RESULTS: Application of single-cell genotyping to four CRC-DD revealed the co-occurrence of both mutations in the following percentages of cells per case: NRAS G13D/KRAS G12C, 95%; KRAS G12D/NRAS G12V, 48%; BRAF V600E/KRAS G12D, 44%; and KRAS G12D/NRAS G13V, 14%, respectively. Allelic imbalance favoring the oncogenic allele was less frequent in CRC-DD (24 of 76, 31.5%, somatic mutations) compared with a curated cohort of CRC with a single-driver mutation (CRC-SD; 119 of 232 mutations, 51.3%; P = .013). Microsatellite instability-high status was enriched in CRC-DD compared with CRC-SD (23% v 11.4%, P = .028). Of the seven CRC-DD cases with multiregional sequencing, five retained both driver mutations throughout all sequenced tumor sites. Both CRC-DD cases with discordant multiregional sequencing were microsatellite instability-high. CONCLUSION: Our findings indicate that dual-driver mutations occur in a rare subset of CRC, often within the same tumor cells and across multiple tumor sites. Their presence and a lower rate of allelic imbalance may be related to dose-dependent signaling within the mitogen-activated protein kinase pathway.


Subjects
Colorectal Neoplasms , Proto-Oncogene Proteins B-raf , Colorectal Neoplasms/genetics , Humans , Microsatellite Instability , Mitogen-Activated Protein Kinases/genetics , Mutation/genetics , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins p21(ras)/genetics
16.
BMC Med Genomics ; 15(1): 43, 2022 03 04.
Article in English | MEDLINE | ID: mdl-35246132

ABSTRACT

BACKGROUND: The application of long-read sequencing using the Oxford Nanopore Technologies (ONT) MinION sequencer is getting more diverse in the medical field. Having a high sequencing error of ONT and limited throughput from a single MinION flowcell, however, limits its applicability for accurate variant detection. Medical exome sequencing (MES) targets clinically significant exon regions, allowing rapid and comprehensive screening of pathogenic variants. By applying MES with MinION sequencing, the technology can achieve a more uniform capture of the target regions, shorter turnaround time, and lower sequencing cost per sample. METHOD: We introduced a cost-effective optimized workflow, ECNano, comprising a wet-lab protocol and bioinformatics analysis, for accurate variant detection at 4800 clinically important genes and regions using a single MinION flowcell. The ECNano wet-lab protocol was optimized to perform long-read target enrichment and ONT library preparation to stably generate high-quality MES data with adequate coverage. The subsequent variant-calling workflow, Clair-ensemble, adopted a fast RNN-based variant caller, Clair, and was optimized for target enrichment data. To evaluate its performance and practicality, ECNano was tested on both reference DNA samples and patient samples. RESULTS: ECNano achieved deep on-target depth of coverage (DoC) at average > 100× and > 98% uniformity using one MinION flowcell. For accurate ONT variant calling, the generated reads sufficiently covered 98.9% of pathogenic positions listed in ClinVar, with 98.96% having at least 30× DoC. ECNano obtained an average read length of 1000 bp. The long reads of ECNano also covered the adjacent splice sites well, with 98.5% of positions having ≥ 30× DoC. Clair-ensemble achieved > 99% recall and accuracy for SNV calling. The whole workflow from wet-lab protocol to variant detection was completed within three days. 
CONCLUSION: We presented ECNano, an out-of-the-box workflow comprising (1) a wet-lab protocol for ONT target enrichment sequencing and (2) a downstream variant detection workflow, Clair-ensemble. The workflow is cost-effective, with a short turnaround time for high accuracy variant calling in 4800 clinically significant genes and regions using a single MinION flowcell. The long-read exon captured data has potential for further development, promoting the application of long-read sequencing in personalized disease treatment and risk prediction.


Subjects
High-Throughput Nucleotide Sequencing , Nanopores , Cost-Benefit Analysis , High-Throughput Nucleotide Sequencing/methods , Humans , DNA Sequence Analysis/methods , Workflow
17.
NAR Genom Bioinform ; 4(1): lqac005, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35156024

ABSTRACT

HKG is the first fully accessible variant database for Hong Kong Cantonese, constructed from 205 novel whole-exome sequencing data. There has long been a research gap in the understanding of the genetic architecture of southern Chinese subgroups, including Hong Kong Cantonese. HKG detected 196 325 high-quality variants with 5.93% being novel, and 25 472 variants were found to be unique in HKG compared to three Chinese populations sampled from 1000 Genomes (CHN). PCA illustrates the uniqueness of HKG in CHN, and the admixture study estimated the ancestral composition of HKG and CHN, with a gradient change from north to south, consistent with their geographical distribution. ClinVar, CIViC and PharmGKB annotated 599 clinically significant variants and 360 putative loss-of-function variants, substantiating our understanding of population characteristics for future medical development. Among the novel variants, 96.57% were singleton and 6.85% were of high impact. With a good representation of Hong Kong Cantonese, we demonstrated better variant imputation using reference with the addition of HKG data, thus successfully filling the data gap in southern Chinese to facilitate the regional and global development of population genetics.

18.
Nat Comput Sci ; 2(12): 797-803, 2022 Dec.
Article in English | MEDLINE | ID: mdl-38177392

ABSTRACT

Deep learning-based variant callers are becoming the standard and have achieved superior single-nucleotide polymorphism calling performance using long reads. Here we present Clair3, which leverages two major method categories: pileup calling handles most variant candidates with speed, and full-alignment tackles complicated candidates to maximize precision and recall. Clair3 runs faster than any of the other state-of-the-art variant callers and demonstrates improved performance, especially at lower coverage.


Subjects
Deep Learning , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism/genetics
20.
Innovation (Camb) ; 2(4): 100153, 2021 Nov 28.
Article in English | MEDLINE | ID: mdl-34901902

ABSTRACT

The Human Genome Project opened an era of (epi)genomic research, and also provided a platform for the development of new sequencing technologies. During and after the project, several sequencing technologies continue to dominate nucleic acid sequencing markets. Currently, Illumina (short-read), PacBio (long-read), and Oxford Nanopore (long-read) are the most popular sequencing technologies. Unlike PacBio or the popular short-read sequencers before it, which, as examples of the second or so-called Next-Generation Sequencing platforms, need to synthesize when sequencing, nanopore technology directly sequences native DNA and RNA molecules. Nanopore sequencing, therefore, avoids converting mRNA into cDNA molecules, which not only allows for the sequencing of extremely long native DNA and full-length RNA molecules but also documents modifications that have been made to those native DNA or RNA bases. In this review on direct DNA sequencing and direct RNA sequencing using Oxford Nanopore technology, we focus on their development and application achievements, discussing their challenges and future perspective. We also address the problems researchers may encounter when applying these approaches in their research topics, and how to resolve them.
